ABSTRACT

The spectral energy distribution of galaxies is a complex function of the star formation 
history and geometrical arrangement of stars and gas in galaxies. The computation 
of the radiative transfer of stellar radiation through the dust distribution is time- 
consuming. This aspect becomes unacceptable in particular when dealing with the 
predictions by semi-analytical galaxy formation models populating cosmological vol- 
umes, to be then compared with multi-wavelength surveys. Mainly for this aim, we 
have implemented an artificial neural network algorithm into the spectro-photometric 
and radiative transfer code GRASIL in order to compute the spectral energy distri-
bution of galaxies in a short computing time. This avoids having to adopt
empirical templates that may have nothing to do with the mock galaxies output by 
models. The ANN has been implemented to compute the dust emission spectrum (the 
bottleneck of the computation), and separately for the star-forming molecular clouds
and the diffuse dust (due to their different properties and dependencies). We have 
defined the input neurons effectively determining their emission, which means this 
implementation has a general applicability and is not linked to a particular galaxy 
formation model. We have trained the net for the disc and spherical geometries, and 
tested its performance to reproduce the SED of disc and starburst galaxies, as well 
as for a semi-analytical model for spheroidal galaxies. We have checked that for this 
model both the SEDs and the galaxy counts in the Herschel bands obtained with the 
ANN approximation are almost superimposed on the same quantities obtained with
the full GRASIL. We conclude that this method appears robust and advantageous, and 
will present the application to a more complex SAM in another paper. 

Key words: radiative transfer - methods: numerical - galaxies: evolution - infrared: 
galaxies 



1 INTRODUCTION 

The spectral energy distribution (SED) of a galaxy contains 
a wealth of information, and through its study much can be 
learned about the galaxy's properties, including the stellar
and gas content of the galaxy, the age and abundances of the 
stellar populations, the chemistry and physical state of the 
interstellar medium, and the star formation rate (SFR) and 
history. It is therefore the most direct probe to study galaxy
formation and evolution, both through direct observations
and also by theoretical modelling. 

Different spectral ranges tend to be dominated by dif- 
ferent specific emission sources or radiative processes which 
affect the light as it travels through the interstellar medium 
(ISM). Therefore by analyzing and predicting the whole 
spectral range one can hope to deconvolve and interpret 
all the different information contained in the SED in terms 
of the SFR history and galaxy evolution in general. Stel- 
lar sources mainly emit in the UV/optical to NIR spectral 
range, and the SED in this wavelength region is therefore
heavily influenced by the star formation history of the galaxy
and as a result can be used to study the specific mixture of 
ages, metallicity and mass distribution of the stellar popula- 
tions. UV photons ionize and excite the gas, producing HII 
regions with emission lines, which are probes of the SFR and
the chemistry, energetics and physical state of the ISM where 
they are produced. Atomic and molecular lines are present 
from the X-ray to the radio range originating from elec- 
tronic or rotational/vibrational transitions (e.g. Stasinska 
2007). The X-ray range probes mainly the emission from hot
plasma and from X-ray binary stars (e.g. Fabbiano 2006). 
The radio continuum emission is mainly produced by free- 
free emission from ionized nebulae and synchrotron radia- 
tion by energetic electrons accelerated in supernova remnant 
shocks and moving in the galactic magnetic field (e.g. Con-
don 1992). The SED from a few μm to the sub-mm (the
IR region) is dominated by the interaction of dust grains 
with stellar radiation. Dust in galaxies, although only a 
small fraction of the mass of gas (~ 0.01 in our Galaxy), is 
a fundamental ingredient prevalent in many environments, 
such as circumstellar envelopes, supernova remnants, star- 
forming regions and diffuse clouds. Dust grains absorb and 
scatter short wavelength stellar radiation (λ ≲ 1 μm) with
high efficiency and thermally emit the absorbed energy in
the IR. In addition to their effect on the SED, dust grains
also affect many important chemical and physical processes, 
for instance by acting as a catalyst for the formation of H2 
molecules, by shielding dense and cold regions from photo- 
dissociating UV photons allowing gravitational collapse and 
star formation, by driving mass loss in evolved stars, and by 
depleting heavy elements from the gas phase (e.g. Mathis 
1990; Dorschner & Henning 1995; Draine 2003). 

The modelling of the entire SEDs of galaxies is there- 
fore very complex and full of uncertainties. Because of this, 
several different approaches have been proposed, depend- 
ing also on the purpose of the applications. Some works 
(e.g. Devriendt, Guiderdoni & Sadat 1999; Chary & El-
baz 2001; Dale et al. 2001; Dale & Helou 2002; Galliano
et al. 2003; Lagache, Dole & Puget 2003; Da Cunha, Char-
lot & Elbaz 2008) have proposed semi-empirical treatments
of the SEDs. The aim of these works is in general to inter- 
pret very large samples of data, requiring fast computing 
times making use of observationally or physically motivated 
SEDs. Other works are based on theoretical computations in 
order to have a more general applicability in terms of inter- 
pretative and predictive power. Within this approach differ- 
ent components and levels of complexity have been consid- 
ered. Several papers deal with the radiative transfer (RT) 
in spherical geometries, mainly aimed at modelling star-
burst galaxies (e.g. Rowan-Robinson 1980; Rowan-Robinson
& Crawford 1989; Efstathiou, Rowan-Robinson & Sieben-
morgen 2000; Efstathiou & Rowan-Robinson 2003; Takagi,
Arimoto & Hanami 2003; Takagi, Vansovicius & Arimoto 
2003; Siebenmorgen & Krugel 2007). In the series of papers 
by Dopita et al. (2005, 2006a,b) and Groves et al. (2008) 
a sophisticated modelling of the SED of starburst galaxies
has been presented, which includes the evolution of stellar
populations, the dynamical evolution of HII regions and con-
tinuum and line emission. The series of papers by Popescu et
al. (2000), Misiriotis et al. (2001), Tuffs et al. (2004), Möllen-
hoff, Popescu & Tuffs (2006) is focussed on a detailed mod-
elling and interpretation of the SED of spiral galaxies, from
the UV to sub-mm, to provide constraints for several quan- 
tities such as optical depths, attenuations, scale radii for the 
distribution of stars and dust. The most general treatments 
of RT, capable of dealing with arbitrary geometrical config- 
urations, are based on Monte Carlo codes (e.g. Bianchi et 
al. 1996; Gordon et al. 2001; Baes et al. 2003; Chakrabarti 
et al. 2008; Li et al. 2008). Among these, the code DIRTY 
(Gordon et al. 2001; Misselt et al. 2001) includes extinction 
and dust emission and clumping of dust; SUNRISE (Jons-
son 2006; Jonsson & Primack 2010; Jonsson, Groves & Cox
2010) computes extinction and dust thermal emission and 
has been applied to hydrodynamical simulations of spirals 
including as sub-grid the HII region models by Groves et al. 
(2008); the code TRADING (Bianchi 2007, 2008) includes both
extinction and dust thermal emission, the clumping of gas 
and stars, and has been applied to study images of spirals. 
The drawback of Monte Carlo codes is the very long com- 
puting times they require, which becomes prohibitive when 
for instance applied to galaxy formation models in cosmo- 
logical volumes, where typically mock catalogues of many 
thousands or tens of thousands of galaxies are necessary to 
compare with observational constraints, for example multi- 
wavelength luminosity functions (LF) and galaxy counts. 

For a general purpose modelling of galaxy SEDs we de- 
veloped the code GRASIL (Silva et al. 1998 [S98]; Silva 1999 
[S99]; Granato et al. 2000 [G00]; Bressan, Silva & Granato
2002; Silva et al. 2001; Panuzzo et al. 2003; Vega et al. 2005;
Schurer et al. 2009). Our main aims were to construct a rel- 
atively realistic and flexible multi-wavelength model, which 
could calculate a galactic SED in a reasonably short comput- 
ing time, to be applied both to interpret observations and 
to make predictions in conjunction with galaxy formation 
models. These requirements heavily influenced our general 
choices, promoting the decision to include a realistic bulge 
plus disk geometry, the radiative effects of different dusty 
environments and the clumping of stars and dust, but to 
avoid Monte Carlo calculations and to have some degree of 
geometrical (axial and equatorial) symmetry. With these in- 
gredients the model has been successfully applied in many 
contexts (e.g. Granato et al. 2004; Baugh et al. 2005; Silva
et al. 2005; Panuzzo et al. 2007a,b; Iglesias-Páramo et al.
2007; Fontanot et al. 2007, 2009; Galliano et al. 2008; Vega
et al. 2008; Lacey et al. 2008, 2010; Michalowski et al. 2009, 
2010; Schurer et al. 2009; Santini et al. 2010). 

The study of galaxy formation and evolution has been 
receiving increasing interest, both observationally and theo- 
retically. Observational programs covering the whole wave- 
length range are systematically and directly unveiling galaxy 
populations at all redshifts, whose main properties depend 
on the selection criteria. The detection of high-redshift
galaxy populations is particularly important to track the
process of galaxy formation as a function of the cosmic 
epoch. Three main spectral ranges are used, each detecting 
galaxies with different masses, levels of star formation and at 
different evolutionary stages. IR and sub-mm data collected 
initially by IRAS (Neugebauer et al. 1984; Soifer et al. 1987), 
then mainly by COBE (Puget et al. 1996; Fixsen et al. 1998; 
Hauser & Dwek 2001), ISO (Kessler et al. 1996; Genzel &
Cesarsky 2000), SCUBA (Holland et al. 1999; Smail et al.
1997, 2002; Hughes et al. 1998) and more recently by Spitzer 
(Werner et al. 2004; Soifer, Helou & Werner 2008) have
shown that at least half of the energy emitted by stars over
the history of the Universe has been reprocessed by dust in 
the IR, with a high-z star formation activity much stronger 
than locally, as witnessed by the fast evolution of the pop- 
ulation of IR-bright galaxies discovered with the mid-IR to 
mm cosmological surveys (see e.g. the review by Lagache, 
Puget & Dole 2005). At shorter wavelengths, large pop- 
ulations of high-z star-forming galaxies, the Lyman-break 
galaxies (LBGs), have been detected in the optical bands 
from their stellar emission in the rest-frame UV, exploiting
the spectral break around 912 Å produced by absorption
by intervening neutral hydrogen (e.g. Giavalisco 2002). The
dust extinction corrections required to provide an estimate
for their SFR (~ 10 to 100 M⊙/yr) are probably very large (a
factor ~ 3-10, e.g. Meurer, Heckman & Calzetti 1999), but
remain very uncertain since at least part of the SFR could be 
optically totally hidden. Deep near-IR surveys and estimates 
of stellar mass functions have revealed a substantial popula- 
tion of already massive, and in many cases already evolved, 
galaxies at z > 1 (e.g. Drory et al. 2003, 2005; Cimatti et 
al. 2004; Fontana et al. 2004; Bundy et al. 2005; Saracco et 
al. 2005; Caputi et al. 2006). These observations reveal that 
the most massive galaxies tend to be the oldest at all the 
sampled redshifts, i.e. the high-luminosity/high-mass tails 
of the luminosity/mass functions are found to evolve only
weakly since z ~ 5 to now (e.g. Cimatti, Daddi & Renzini 
2006). 

From the theoretical point of view, the modelling of 
galaxy formation and evolution in a cosmological context 
involves many processes at very different scales, from Mpcs 
to a pc and under. The widest range of observed galaxy prop-
erties has been analyzed using the so-called semi-analytic
models (SAM; White & Rees 1978; Lacey & Silk 1991; White
& Frenk 1991), which consist of calculating the evolution of
the baryon component using simple analytical approxima-
tions, while the evolution of the dark matter is calculated
directly using gravity-only N-body simulations, or Monte
Carlo techniques based on the extended Press-Schechter the-
ory (Lacey & Cole 1993).

The final step to get the output simulated galaxy cata- 
logues which can be compared to observations, is the com- 
putation of the full wavelength range SED for each mock 
galaxy. This should be calculated by appropriately taking 
into account for each galaxy its particular star formation and 
metallicity history and geometrical arrangement of the stel- 
lar populations and of the ISM, as predicted by the model. 
The simulated SED catalogue can then be compared to real 
observed galaxy surveys, so as to check whether the predic-
tions are representative of the real universe and
to retrieve some information on the galaxy formation pro- 
cess. In principle, the most general way to proceed would be 
to use a model which allows any geometrical configuration 
for the distribution of stars and ISM, such as a full Monte 
Carlo radiative transfer code. However, this is in practice 
not feasible, because of unacceptable computing times, nor 
necessary since SAMs themselves lack detailed geometrical
information about the simulated galaxies. In fact, radiative 
transfer Monte Carlo codes at present are only used coupled 
with hydro-simulations of single galaxies, not for cosmolog- 
ical applications (e.g. Chakrabarti et al. 2008; Rocha et al. 
2008; Narayanan et al. 2010). Since the attempt to theoret-
ically understand the assembly of baryons within the hier-
archy of dark matter halos is inevitably subject to strong
uncertainties and degeneracies, as many observational con- 
straints as possible must be taken into account by models, 
in order to get some hints as to the overall scenario and pos- 
sibly the main physical processes involved. Therefore only a 
complete multi-wavelength analysis of galaxy data can be 
used to help unlock the complexities of galaxy formation 
and evolution. 

Most semi-analytical models have made use of simple 
empirical treatments to compute the SED (e.g. Guiderdoni 
et al. 1998; Kauffmann et al. 1999; Somerville & Primack 
1999; Hatton et al. 2003; Blaizot et al. 2004; Kang et al. 
2005; Kitzbichler & White 2007). The only SAMs that in-
clude a UV to sub-mm radiative transfer computed from
first principles are GALFORM (Cole et al. 2000; G00; Baugh
et al. 2005; Lacey et al. 2008, 2010; Swinbank et al. 2008),
MORGANA (Modelling the Rise of GAlaxies aNd Active nuclei,
Monaco et al. 2007; Fontanot et al. 2007, 2009), and ABC
(Anti-hierarchical Baryonic Collapse; Granato et al. 2004; 
Silva et al. 2005; Lapi et al. 2006). These models have been 
interfaced with GRASIL to make detailed comparisons and 
predictions in different spectral ranges. 

As previously mentioned, the GRASIL code has been 
written in order to calculate an accurate SED in a relatively 
quick time and this has allowed the model to be used ex- 
tensively for calculating the SEDs for the above mentioned 
SAMs. Despite this, the calculation of the IR SED by GRASIL 
is still often the bottleneck of the whole project and the 
computing time becomes prohibitive when considering the 
exploitation of large-scale structure simulations such as the 
Millennium Simulation (Springel et al. 2005), which would
require millions of galaxy calculations. 

To improve on this, in particular for use with
cosmological applications, we have implemented
in GRASIL the possibility of computing SEDs with an Artifi- 
cial Neural Network (ANN) algorithm. This will reduce the 
computing time significantly without having to rely on unre- 
alistic template approaches or simplified analytical recipes. 
According to the required application one can choose the de- 
sired computational method: a full GRASIL calculation or the 
quicker ANN mode. The bottleneck within the GRASIL code 
is the computation of the cirrus and the molecular clouds 
dust emissions. It is therefore these two processes that the 
ANN will be applied to, with the option of using the ANN for 
either or both of the processes. Another interesting applica- 
tion made possible due to the improved performance of the 
ANN based computation would be the combination of this 
code with an algorithm which could automatically search 
the GRASIL parameter space in order to find optimized pa- 
rameters to fit real observed individual galaxy SEDs. In this 
first paper, the ANN for the emission from the diffuse dust 
has been implemented for two geometrical arrangements, 
pure disc or spherical distribution of stars and dust, and 
we test and apply it to cases suited for these geometries. 
In particular, as a practical sample application, we compute 
galaxy counts in the Herschel imaging bands for the ABC 
model for spheroidal galaxies. In another paper, we will
present the implementation of the ANN also for the mixed 
bulge+disc geometry, and apply it to more complex semi- 
analytical models (e.g. GALFORM). 

Almeida et al. (2010) have already used the ANN algo-
rithm to insert the GALFORM+GRASIL model into the Millen-
nium Simulation, to study the properties of the population
of sub-mm galaxies. That method is complementary and 
substantially different from the one presented here. They 
identify the properties of the GALFORM galaxies which deter- 
mine, through an ANN, their GRASIL SEDs. The method is 
successful and extremely fast, since the ANN is used to com- 
pute the entire SED, not only the IR dust emission. However 
their implementation is very specific to GALFORM+GRASIL, 
and each realization of GALFORM requires a training of the 
ANN, one for each output redshift. The method we have 
implemented here is less fast but more general, because the 
input is directly linked to the galactic properties effectively 
determining the portion of the SED dominated by dust emis-
sion (e.g. optical depths, masses of dust, radiation field etc.).
As a result, one single training is able to cover a large variety 
of applications. 

In Sec. 2 we recall the main properties of GRASIL and
the latest updates; in Sec. 3 we provide some generalities
on ANN, and in Sec. 3.2 we describe the implementation
of ANN in GRASIL, the choice of the input neurons and the
definition of the trained nets; in Sec. 4 we show some appli-
cations and examples. Finally, our conclusions are presented
in Sec. 5.



2 MODELLING SEDS WITH GRASIL 
2.1 General description 

GRASIL (GRAphite & SILicate) is a code constructed to com- 
pute the SED of galaxies from the far-UV to the radio wave- 
length range, treating with particular care the effects of dust 
reprocessing on the stellar radiation and including contin- 
uum and line emission. In this Section we provide a sum- 
mary of its principal characteristics, that we will need be- 
low to introduce the implementation of ANN. We refer to 
the original papers for more details (in particular S98; S99; 
G00; Panuzzo et al. 2003; Vega et al. 2005).

The main aims of GRASIL are to provide a relatively re- 
alistic and flexible modelling of galaxy SEDs, together with 
an acceptable (for most applications) computing time. These 
requirements are reflected in its main features: 

• Galaxies are represented with stars and dust distributed 
in a bulge and/or a disc, adopting respectively a King and a 
double exponential profile (see e.g. Fig. 2.7 in S99 or Fig. 1
in G00 for a schematic representation of the geometry and
components). 

• Three different dusty environments and their corre- 
sponding interaction with stars are considered: the star- 
forming molecular clouds (MCs) associated with newly-born 
stars, the diffuse medium ("cirrus") associated with more 
evolved stars, and the dusty envelopes around AGB stars 
(intermediate age stars), their relative contribution to the 
SED depending on the star formation history. 

• The birth of stars within MCs and their gradual disper- 
sion into the diffuse medium is accounted for by decreasing 
the fraction of energy stars emit within MCs with increasing 
age over a typical "escape timescale" (Sec. 2.5 and Eq. 8 in 
S98 for more details). Therefore we account for the clump- 
ing of (young) stars and dust within a diffuse medium, and 
for a greater attenuation suffered by the youngest stars, i.e.
the attenuation is age-dependent (e.g. G00, in par-
ticular their Fig. 11, and Panuzzo et al. 2007a).

• The dust model is made of graphite and silicate spher- 
ical grains with a continuous size distribution including 
grains in thermal equilibrium with the radiation field, very
small grains fluctuating in temperature due to the absorp- 
tion of single UV photons, and PAH molecules (optical prop- 
erties by Draine & Lee 1984; Laor & Draine 1993; Li & 
Draine 2001; Draine & Li 2007). We compute the response 
to the incident radiation field for each type of grain.

• The RT is exactly solved for the MCs with the Granato 
(1994, 1997) code originally developed for AGN tori, imple-
menting the Λ-iteration algorithm. The MCs are represented
as spherically symmetric clouds with the stars as a central 
point source (see Sec. 2.5.1 in S98 for a discussion on this as- 
sumption). Star forming MCs typically have extremely high 
optical depths even in the IR, which means IR-produced 
photons are self-absorbed, thus requiring a full RT treat- 
ment. Moreover, the youngest massive stars still embedded 
in MCs are also those emitting more strongly in the UV 
where the dust opacity is the highest. 

• The model galaxy is binned in appropriately small vol-
ume elements. The radiation field is evaluated in each of
them from the knowledge of the distribution of stars and
dust. Consequently the local dust emission and the attenu-
ated radiation along each desired line of sight are computed.
The treatment of the RT and dust emission in the diffuse 
phase (the real bottleneck of the whole computation) is ap- 
proximated (see Section 2.5.2 in S98 and 2.5.3 in S99). 

• Our reference library of SSPs is from Bressan et al. 
(1998, 2002). We recall that the effects of the dusty envelopes 
around AGB stars and the radio emission (both thermal and 
non-thermal) are directly included in these SSPs. But any 
desired SSP library can be given in input to GRASIL (e.g. 
Fontanot & Monaco 2010 tested the effects of both Bressan 
et al. and Maraston (2005) SSPs in MORGANA+GRASIL).

• The output consists of a UV to radio SED. The maxi- 
mum resolution is set by that of the input SSPs for the UV 
to NIR, while the wavelength points necessary to properly define
the dust features are in any case set by the code. In addition 
to the continuum and dust features, it is possible to include 
the computation of the nebular emission lines as described 
in Panuzzo et al. (2003). 



It is worth noting that although the first release of GRASIL
was more than 10 years ago, it is still the state of the art
in the field. In fact the basic problem of RT remains to
find a compromise between computing time and choice of 
approximations, depending on the purpose. We have car- 
ried out over the years several improvements, mostly follow- 
ing observational progress. In particular, the emission bands 
from PAHs have been updated with respect to S98, which 
were based on pre-ISO (Infrared Space Observatory) obser- 
vations, and Vega et al. (2005), based on ISO observations 
(Li & Draine 2001), by adopting at present the absorption 
cross sections and band profiles by Draine & Li (2007). This
last update has been driven by the availability of Spitzer 
data.

2.2 Inputs

The inputs required by GRASIL consist of the star formation, 
gas and metallicity evolution histories, and a set of geomet-
rical parameters.

The former ingredients can be provided by analytical 
star formation laws, or by "classical" chemical evolution 
models, or by more complex galaxy formation models (e.g. 
GALFORM, MORGANA, ABC, see the Introduction). Our reference 
code for generating star formation histories, CHE_EVO, is de-
scribed in S99 (see also Section 2.2 in Fontanot et al. 2009),
and will be used below to generate the libraries to train the
ANN. It computes the evolution of the SFR, mass of gas,
metallicity, and of the chemical elements given an IMF and
a SF law of the kind $\mathrm{SFR}(t) = \nu\, M_{\rm gas}(t)^{k} + f(t)$, where the
first term is a Schmidt-type SF with efficiency $\nu$, and $f(t)$
is an analytical term. Note however that our approach is in- 
dependent of this choice, and indeed our aim is to compute 
ANNs that can work with any engine to generate the SF 
histories. 
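
As an illustration of a SF law of this kind, the following minimal Python sketch integrates it for a closed box. It is not CHE_EVO: the function names, the units and the closed-box assumption (no infall, no gas returned by dying stars) are ours, chosen only to show how the Schmidt-type term and the analytical term f(t) combine.

```python
import math


def schmidt_sfr(m_gas, t, nu, k, f=lambda t: 0.0):
    """SFR(t) = nu * M_gas(t)**k + f(t): Schmidt-type term plus analytical term."""
    return nu * m_gas ** k + f(t)


def evolve_closed_box(m_gas0, nu, k, t_end, n_steps, f=lambda t: 0.0):
    """Forward-Euler integration of dM_gas/dt = -SFR(t).

    Assumption (ours, for illustration only): closed box, i.e. no infall
    and no gas returned by stars. Units are arbitrary but must be consistent.
    Returns the list of (t, M_gas, SFR) and the final gas mass.
    """
    dt = t_end / n_steps
    m_gas = m_gas0
    history = []
    for i in range(n_steps):
        t = i * dt
        sfr = schmidt_sfr(m_gas, t, nu, k, f)
        history.append((t, m_gas, sfr))
        # Gas consumed by star formation; never allow a negative gas mass.
        m_gas = max(m_gas - sfr * dt, 0.0)
    return history, m_gas


if __name__ == "__main__":
    history, m_final = evolve_closed_box(m_gas0=1.0, nu=0.5, k=1.0,
                                         t_end=2.0, n_steps=10000)
    # For k = 1 and f = 0 the exact solution is M_gas0 * exp(-nu * t).
    print(m_final, math.exp(-0.5 * 2.0))
```

For k = 1 and f(t) = 0 the gas mass decays exponentially, which gives a simple check of the integration; any other engine producing SFR(t), M_gas(t) and metallicity histories could be substituted.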

The other inputs required by GRASIL, some of which can 
be provided by galaxy formation models, are: 

• $f_{\rm mc}$: gas mass fraction in star-forming MCs. It affects
mainly the FIR-submm.

• $\tau_{\rm mc}$: optical depth of MCs. It strongly affects the mid-
IR and the silicate absorption feature.

• $t_{\rm esc}$: escape time scale of young stars from the MCs. It
affects mainly the IR to UV-optical ratio.

• $\delta = M_{\rm dust}/M_{\rm gas}$: dust to gas mass ratio. It is custom-
ary to set it either to a fixed value or proportional to the
metallicity, unless provided by a dust evolution model (e.g. 
as in Schurer et al. 2009). 

• Bulge scale-radii (core radii of the King profile) and disk 
scale-radii and heights (of the double exponential) for stars 
and dust. The distribution of the radiation field relative to 
the dust determines the dust temperature distribution. 



3 COMPUTING SEDS WITH ARTIFICIAL 
NEURAL NETWORKS 

3.1 Basic concepts of ANN 

ANNs were first introduced as very simplified models of the
brain behavior (McCulloch & Pitts 1943; Rosenblatt 1958), 
mathematical models able to learn from examples and data. 
They proved very useful in tackling many computation- 
ally complex problems, generally non-linear, such as pattern 
recognition, classification and function approximation. They 
are now widely used in all scientific areas, for instance in 
biochemistry, neuroscience, computer science, mathematics, 
finance, physics as well as in astrophysics. The architecture 
of ANNs reflects in some way the biological brain, in that
they consist of processing units (neurons) with multiple
connections organized as a network and working in paral- 
lel. These connections have adaptable strengths (synaptic 
weights) which modify the signal transmitted to and from 
each neuron. But in practice ANNs can be considered pow-
erful data modelling tools with different possible implemen-
tations to address different problems. Their ability to learn
and generalize, and their adaptability, offer several advantages over
other data mining and analysis tools.



The working of ANNs is defined by their architecture, 
propagation rule and learning algorithm: 

• Network Architecture: The architecture or topology 
of ANNs refers to the pattern of connections between the 
computing units and the propagation of data. It can be split 
into 2 main classes, the feed-forward (FF, the kind of ANN 
we used) and the feedback or recurrent ANNs. In the FF 
case, the information moves only in the forward direction 
from the input to the output neurons. Recurrent networks 
contain feedback connections, cycles and loops. The neurons 
are commonly organized in layers, generally with an input 
layer, an output layer and in the more interesting case one 
or more hidden layers. In the FF pattern, each layer consists 
of units which receive their inputs from a layer directly be- 
low and send their output to units in a layer directly above. 
There are no connections within a layer. The simplest network,
which can be built with no hidden layers, is commonly called
a Perceptron (Rosenblatt 1958), and can be used only for
linear applications. For more difficult tasks, it is necessary
to have at least one hidden layer (multi-layer perceptrons, 
henceforth MLPs). In particular, the universal approxima- 
tion theorem (Haykin 1999) states that one single layer of 
hidden units suffices to approximate any function to arbi- 
trary precision, provided that the activation function (see 
the propagation rule) of the hidden layer is non-linear. In- 
deed, in our application we got satisfactory results with a 
single hidden layer. 

• Signal propagation rule: The basic working of the 
brain consists in neurons receiving electrochemical signals 
from other neurons, some of which excite the cell, while oth- 
ers inhibit it. The neurons add all these contributions and, 
if the sum is greater than a certain threshold, the neuron is 
activated, i.e. it transmits the signal further on. In analogy, 
each computational unit in the net receives a signal from 
all the neurons it is connected to, with the strength of each 
connection quantified by a weight. The unit multiplies each 
input signal by the weight of the corresponding connection 
and sums all the contributions. At least one non-linear acti-
vation function f is then applied to the total signal to give the
output value that is then passed on as an input to the neu-
rons in the following layers. In practice: $o_j = f\left(\sum_k w_{jk}\, i_k\right)$,
where $o_j$ is the output signal from the jth neuron, $i_k$ are the
incoming signals from all the neurons connected to it, with
corresponding weights $w_{jk}$. Typical activation functions are
the sigmoid, $f(x) = 1/(1 + e^{-x})$, Gaussian, $f(x) = e^{-x^2}$, and
Elliot, $f(x) = x/(1 + |x|)$.

• Learning algorithms: There are 2 main methods for 
the learning or training of the ANNs, supervised and unsu- 
pervised nets. In the first case (our case) the net is trained 
with a given target, i.e. the ANN is taught that for a given 
input it has to provide a given output, and the net adapts its 
connections (weights) so as to produce the desired answer. 
In unsupervised learning the net does not have a target out- 
put. It is used to find patterns and group the data. 

There are several methods within supervised learning,
all of which consist of a comparison between the predicted
output from the ANN and the target output. Our choice
is the back-propagation (BP) algorithm (Rumelhart, Hinton 
& Williams 1986), which is the most widely used one. The 
errors are propagated backwards from the output nodes (di-
rectly defined by the comparison between the predicted and
target values) to the inner nodes. This method is used to
calculate the gradient of the error of the network with re- 
spect to the network's modifiable weights and thus to adjust 
the weights to find the (local) minimum of the error function 
with the gradient descent method. 
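
The propagation rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used in GRASIL: the function names are ours, and the activation functions are written in their common textbook forms, which we assume here.

```python
import math


def sigmoid(x):
    """Sigmoid activation, f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def gaussian(x):
    """Gaussian activation, f(x) = exp(-x^2)."""
    return math.exp(-x * x)


def elliot(x):
    """Elliot activation, f(x) = x / (1 + |x|)."""
    return x / (1.0 + abs(x))


def neuron_output(weights, inputs, activation):
    """o_j = f(sum_k w_jk * i_k) for a single neuron j."""
    return activation(sum(w * i for w, i in zip(weights, inputs)))


def layer_output(weight_rows, inputs, activation):
    """Apply the propagation rule to every neuron of a layer.

    weight_rows[j] holds the weights w_jk of neuron j.
    """
    return [neuron_output(row, inputs, activation) for row in weight_rows]


def feed_forward(w_hidden, w_output, inputs, activation=sigmoid):
    """Feed-forward pass through one hidden layer and one output layer."""
    hidden = layer_output(w_hidden, inputs, activation)
    return layer_output(w_output, hidden, activation)


if __name__ == "__main__":
    # Two inputs, three hidden neurons, one output neuron (arbitrary weights).
    w_hidden = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
    w_output = [[0.6, -0.1, 0.8]]
    print(feed_forward(w_hidden, w_output, [1.0, 2.0]))
```

In a feed-forward net the signal only ever flows from `inputs` to `hidden` to the output list, with no connections inside a layer, which is why each layer reduces to one weighted sum per neuron.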

In GRASIL we have implemented a standard feed- 
forward multi-layer back-propagation network (e.g. 
Bishop 1995; Rojas 1996). 



layer, learning rate, maximum number of iterations) the uni- 
versal approximation theorem implies that this architecture 
is capable of universal optimal approximation. Moreover, it 
is found empirically that networks with multiple hidden lay- 
ers are more prone to getting caught in undesirable local 
minima (Haykin 1999; Bishop 1995). To create and use the 
trained net to predict SEDs we have adapted the F90 code 
by B. Fiedler freely available at http://mensch.org/neural/
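
To make the training procedure concrete, the sketch below trains a net with a single hidden layer by back-propagation and gradient descent on a toy one-dimensional function. It is not the F90 code mentioned above: the bias terms, the linear output unit, the full-batch update and all parameter values (number of hidden neurons, learning rate, maximum number of iterations) are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(params, x):
    """One hidden layer of sigmoid units and a linear output unit."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(w * x + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2, hidden


def mean_squared_error(params, samples):
    """Error function E = 1/(2N) * sum (output - target)^2."""
    return sum((predict(params, x)[0] - t) ** 2
               for x, t in samples) / (2.0 * len(samples))


def train(samples, n_hidden=5, learning_rate=0.05, max_iterations=2000, seed=0):
    """Full-batch gradient descent with back-propagated gradients."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    n = len(samples)
    for _ in range(max_iterations):
        g_w1 = [0.0] * n_hidden
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        for x, t in samples:
            y, hidden = predict((w1, b1, w2, b2), x)
            delta_out = y - t  # error at the output node
            g_b2 += delta_out
            for j in range(n_hidden):
                g_w2[j] += delta_out * hidden[j]
                # Error propagated backwards to hidden node j.
                delta_h = delta_out * w2[j] * hidden[j] * (1.0 - hidden[j])
                g_w1[j] += delta_h * x
                g_b1[j] += delta_h
        # Gradient-descent step on all modifiable weights.
        for j in range(n_hidden):
            w1[j] -= learning_rate * g_w1[j] / n
            b1[j] -= learning_rate * g_b1[j] / n
            w2[j] -= learning_rate * g_w2[j] / n
        b2 -= learning_rate * g_b2 / n
    return (w1, b1, w2, b2)


if __name__ == "__main__":
    samples = [(i / 10.0, (i / 10.0) ** 2) for i in range(11)]  # toy target x^2
    untrained = train(samples, max_iterations=0)
    trained = train(samples)
    print(mean_squared_error(untrained, samples),
          mean_squared_error(trained, samples))
```

Because gradient descent only finds a local minimum of the error function, the result depends on the random initial weights; fixing `seed` makes a run reproducible.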



3.2 Implementing ANN in GRASIL 

The prediction of galaxy spectra is a complex problem due to 
the high number of input and output variables, and the non- 
linearity among them. Neural networks represent a viable 
solution for this non-linear function prediction. 

The typical computing time to run GRASIL on a ~ 2
GHz CPU ranges from about half a second for the no dust case
(i.e. the standard pure stellar spectral synthesis, which is not
our interest here) to ~ 10 minutes for the bulge plus disc
case. A pure bulge (i.e. spherical symmetry) requires ~ 1
minute and a pure disc ~ 2 minutes; the exact values depend
on the number of radial and angular grid points,
set according to the "compactness" of the model (see S98
and S99 for details). Most of the time is spent computing the IR
emission from dust, of which ~ 70 to > 90% is required
by the emission from the cirrus component. This is because
each volume element of the model in a 3D grid has its own
radiation field and amount of dust, whose emission is
calculated individually for each type of grain of the dust size
distribution, including grains in thermal equilibrium with
the radiation field, and small grains and PAHs requiring the
computation of a temperature probability distribution function.
Instead, computing the extinction of stellar radiation
by the two dusty components of the ISM (molecular clouds
and cirrus) is straightforward and takes ~ 1 second.

These considerations drove our strategy to use an ANN
algorithm to reconstruct only the IR emission from molecular
clouds and cirrus. To this aim, the fast calculation of the
extinction by the molecular clouds and the cirrus provides
the amount of energy absorbed, and therefore the normalization
for the two components, while the ANN is used
to predict their spectral shapes. Since the CPU
time required to predict the IR emission with the ANN is
negligible with respect to the direct computation, with this
approach we can expect a gain of orders of magnitude in
computing time.
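The division of labour just described, with the shape taken from the ANN and the normalization taken from the absorbed energy, can be illustrated as follows. This is a sketch under stated assumptions: the spectral shape is taken to be tabulated as $\lambda L_\lambda$ on a wavelength grid, the integral is done with a simple trapezoidal rule in $\ln\lambda$, and the function names are hypothetical rather than GRASIL routines.

```python
import math

def integrate_lambda_l_lambda(wavelengths, lam_l_lam):
    """Trapezoidal integral of L_lambda d(lambda), written as
    (lambda L_lambda) d(ln lambda)."""
    total = 0.0
    for i in range(len(wavelengths) - 1):
        dln = math.log(wavelengths[i + 1] / wavelengths[i])
        total += 0.5 * (lam_l_lam[i] + lam_l_lam[i + 1]) * dln
    return total

def normalize_shape(wavelengths, shape, absorbed_luminosity):
    """Rescale a predicted spectral shape so that it radiates the absorbed energy."""
    norm = integrate_lambda_l_lambda(wavelengths, shape)
    if norm <= 0.0:
        raise ValueError("the predicted shape must have a positive integral")
    return [absorbed_luminosity * s / norm for s in shape]
```

With this split the network never has to learn the absolute luminosity, which is fixed by energy conservation.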

Due to the very different nature and treatment of the
MC and cirrus components, the quantities (input neurons)
that determine their respective IR SEDs are different. Therefore
we have implemented a distinct ANN for each of them.
As a byproduct, it is possible to run the code in the "ANN
mode" for both components or for only one of them. In the
latter case, the emission from the other component is
computed in the "full" mode.

We have implemented a standard feed-forward
back-propagation MLP with one hidden layer, using a sigmoid
activation function from the input to the hidden layer.
Its performance relies on properly identifying the input neurons
(Sect. 3.3) and on setting the network parameters, i.e. the number
of neurons of the hidden layer, the learning rate and the maximum
number of iterations (Sect. 3.4).


3.3 Input and output neurons 

For each dusty component we have identified the physical 
quantities controlling their spectral shape. Their integrated 
luminosity, i.e. the normalization, is known from the direct 
computation of the amount of stellar energy absorbed by 
MCs and cirrus respectively. 

The identification of the input neurons is based on physical
expectations corroborated by working experience with
GRASIL. As such, they are closely related to, but not
coincident with, GRASIL parameters. This is because different
combinations of two or more parameters produce identical
or very similar dust emission SEDs. For instance, different
combinations of dust to gas ratio $\delta$, MC mass $M_{MC}$ and
radius $R_{MC}$ produce the same MC SED as long as the MC
optical depth $\tau \propto \delta M_{MC}/R_{MC}^2$ is unchanged (S98, S99).
The same is true for the shape of the cirrus SED, if all the
relevant masses (cirrus dust mass and SFR) are varied by a
factor $f$ and, at the same time, the scale radii by $\sqrt{f}$. Also,
the cirrus and, even more, the MC dust emission depend
only weakly on details of the spectral shape of the input
stellar radiation, which means that different combinations
of SFR($t$), $Z(t)$, galactic age $T_{gal}$, and MC escape timescale
$t_{esc}$ (which affects the fraction of starlight heating the MCs
and that heating the cirrus) may well give rise to almost
identical dust emission in one or both components.
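The scaling argument for the cirrus can be made explicit. The following is only a sketch, under the assumptions that the optical depth scales as a dust column density, that the stellar luminosity $L_\star$ scales with the SFR, and that the mean radiation field $u$ scales as $L_\star/R^2$; the symbols $M_d$, $R$ and $u$ are introduced here for illustration.

```latex
\tau \propto \frac{M_d}{R^2}
\;\longrightarrow\;
\frac{f\,M_d}{(\sqrt{f}\,R)^2} = \frac{M_d}{R^2},
\qquad
u \propto \frac{L_\star}{R^2}
\;\longrightarrow\;
\frac{f\,L_\star}{(\sqrt{f}\,R)^2} = \frac{L_\star}{R^2}.
```

Under these assumptions both the opacity and the radiation field heating each grain are unchanged, so the dust temperature distribution, and hence the shape of the emitted SED, stays the same, while the total luminosity scales by $f$.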

The output neurons are the values of $\lambda L_\lambda$ in the IR
region, both for molecular clouds and for the cirrus, which
usually means several hundred output neurons.



Input neurons for molecular clouds 

Since in all practical cases the MCs are optically thick to the
primary stellar emission heating them, and they are approximated
by homogeneous spheres with constant density, the
shape of the emitted IR SED is controlled only by the two
quantities listed below, which are simply related to GRASIL
parameters. Accordingly, the adopted input neurons are:

• $\tau_{MC} \propto \delta M_{MC}/R_{MC}^2$, the molecular cloud optical depth
(conventionally given at 1 μm).

• $R_{MC}[\mathrm{pc}] / (k \sqrt{L_{\star,MC,46}})$, the ratio of the molecular
cloud radius over an estimate of the dust sublimation radius,
i.e. the inner radius of the dust distribution. The latter
depends on the luminosity of the stars within each cloud,
$L_{\star,MC,46}$ (in $10^{46}$ erg/s). The constant $k$ depends on the adopted
maximum temperature $T_s$ for dust. We found (see discussion in
S98 and S99) that a value of $T_s = 400$ K properly represents
the maximum attainable dust temperature in star
forming molecular clouds, for which $k \sim 12$. We explicitly
note that the value of this constant is irrelevant as long as
we use the same value when training and using the MLP. In
other words, any quantity proportional to this ratio would
be equally good as an input neuron.

Figure 1. Example of the effects of the controlling parameters
of the MC SED. Upper panel. Continuous (black) line: reference
SED ($\tau_{MC} = 25.4$ at 1 μm, and $R_{MC}/(12\sqrt{L_{\star,MC,46}}) = 104$).
Dotted (blue) line: effect of varying $\tau_{MC}$ alone by a factor
of 2 (obtained by doubling the dust to gas mass ratio). Dashed
(pink) line: effect of varying the molecular cloud extension alone
by a factor of 2 (obtained by decreasing the escape time scale,
i.e. the primary source luminosity within MCs and therefore the
inner radius, to a value able to provide a factor of 2 increase
of the neuron). The SEDs are normalized to their own energy
to highlight the change in shape. Lower panel. Residuals with
respect to the reference SED.
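The two molecular cloud input neurons can be illustrated with a short sketch. The function names are hypothetical, the optical depth is returned only up to its (unspecified) proportionality constant, and the $R_{MC}^{-2}$ dependence is the column-density scaling assumed here; the value $k = 12$ and the luminosity unit of $10^{46}$ erg/s are taken from the text.

```python
import math

def mc_optical_depth_scaling(dust_to_gas, mass, radius):
    """Quantity proportional to the MC optical depth, delta * M / R^2
    (arbitrary units, since only the proportionality is given)."""
    return dust_to_gas * mass / radius ** 2

def mc_radius_neuron(radius_pc, luminosity_1e46, k=12.0):
    """Ratio of the cloud radius (pc) to the estimated dust sublimation
    radius, with the stellar luminosity in units of 1e46 erg/s."""
    return radius_pc / (k * math.sqrt(luminosity_1e46))

# Two different parameter combinations with the same optical depth:
tau_a = mc_optical_depth_scaling(0.01, 1.0e6, 16.0)
tau_b = mc_optical_depth_scaling(0.02, 2.0e6, 32.0)
```

Here `tau_a` and `tau_b` coincide, which is the degeneracy that makes the optical depth, rather than the individual GRASIL parameters, the natural input neuron.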

In Fig. 1 we highlight the separate effect of the two
input neurons on the SEDs of molecular clouds. Their variation
is obtained by changing the values of different parameters,
as detailed in the caption. The optical depth has a strong
effect on the slope of the MIR and on the depth of the silicate
absorption. In the example depicted, with the starting
value of ~ 25 at 1 μm, the IR emission is self absorbed up to
~ 25 μm. By doubling it, the optical depth becomes < 1 at ~ 40 μm,
with a much stronger self absorption giving rise to a steeper slope.
The increase in the extension alone allows colder dust temperatures
in the outskirts of the MCs and therefore emission
at longer wavelengths.

Input neurons for Cirrus 

The cirrus emission is defined by 6 neurons for spherical
symmetry and 9 for discs, listed below. As for the mixed bulge
and disc geometry, we have found that if dust heating is
dominated by stars in the disc component, as is the case for
nearby spirals, the pure disc network gives sufficiently accurate
results. But for a general application to mock galaxies
output by SAMs, a mixed geometry must be available, unless
the model explicitly takes into account only spheroidal
or disc galaxies. As recalled in the Introduction, in this paper
we present the implementation of the ANN for spherical
or disc geometries, while the application to the mixed geometry
will be presented in another paper (in preparation).

• $\log(L_{cir}/L_{\star,c})$, the cirrus dust luminosity normalized
to the stellar luminosity heating the cirrus. The former can
obviously be derived without actually computing the dust
emission, since it equals the stellar energy absorbed by the
cirrus. This ratio provides a global measure of the amount
of dust reprocessing.

• $\log(M_{cir}/L_{\star,c})$, the normalized cirrus dust mass, expected
to be strongly correlated with the (distribution of)
dust emitting temperature.

• $\tau_p$ and $\tau_e$, the polar and equatorial optical depths due to
cirrus alone (integral of the dust density distribution along
the polar and equatorial directions respectively, conventionally
given at 1 μm). Only one of the two is used for pure
bulge geometry, since $\tau_e = \tau_p$.

• $\tau_h$, a fictitious optical depth, computed as if cirrus were
spherically and homogeneously distributed. This dummy
quantity was already computed by GRASIL, and it is included
here because its comparison with $\tau_p$ and $\tau_e$ provides
a measure of the "concentration" of the dust distribution
independently of the specific density law assumed. Of course
this concentration significantly affects the shape of the emitted
SED, and indeed we empirically found that its inclusion
improves the performance of the MLP.

• Geometrical ratios: these depend on the geometry of
the galaxy used. For the bulge component: $r_{c,\star}/r_{c,diff}$, which
measures the "relative position" of dust and stars. For the
disc component: $r_{d,\star}/r_{d,diff}$, $z_{d,\star}/r_{d,\star}$ and $z_{d,diff}/r_{d,diff}$. Taken
together these three ratios measure the relative position of
dust and stars and the geometrical thickness of the star and dust
distributions.

• Hardness ratio: ratio of the radiation field at 0.3 μm
over that at 1 μm, heating the cirrus (thus emerging from molecular
clouds and from stars already out of molecular clouds). Since
small grains and especially PAHs are excited most effectively
by relatively hard UV photons, this quantity is correlated
with the ratio between the NIR-MIR emission they
produce and the far-IR due to big grains.
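The cirrus input neurons listed above can be collected into a single input vector, which also makes the count of 6 (sphere) and 9 (disc) explicit. This is a hypothetical helper written for illustration: the function name, the argument names and the ordering of the vector are assumptions, not the GRASIL interface.

```python
import math

def cirrus_inputs(l_cir, m_cir, l_star_c, tau_p, tau_e, tau_h,
                  geometry_ratios, hardness_ratio, geometry="sphere"):
    """Assemble the cirrus input neurons in the order in which they are listed."""
    if geometry == "sphere":
        if len(geometry_ratios) != 1:
            raise ValueError("spherical geometry needs 1 ratio: r_c,star / r_c,diff")
        depths = [tau_p]  # tau_e equals tau_p for a sphere, so only one is used
    elif geometry == "disc":
        if len(geometry_ratios) != 3:
            raise ValueError("disc geometry needs 3 geometrical ratios")
        depths = [tau_p, tau_e]
    else:
        raise ValueError("geometry must be 'sphere' or 'disc'")
    return ([math.log10(l_cir / l_star_c), math.log10(m_cir / l_star_c)]
            + depths + [tau_h] + list(geometry_ratios) + [hardness_ratio])
```

For example, `cirrus_inputs(..., [1.0], 0.2)` returns 6 values for a sphere, while passing three geometrical ratios with `geometry="disc"` returns 9.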

Examples of the effect of the cirrus neurons are shown in
Fig. 2. A variation in the hardness ratio, even by only ~ 20%
as in the figure, has an immediate effect on the temperature
distribution of dust grains with small heat capacity, i.e. very
small grains and PAHs; it therefore mainly affects the mid-IR
emission, leaving the far-IR almost unchanged. Increasing
the amount of dust alone has greater effects on the overall
equilibrium temperature of dust grains, and therefore on the
position of the peak of the FIR emission, because of a smaller
photon to dust density ratio. A similar effect in the FIR, coupled
to a hotter MIR as in the first case, can be obtained by
lowering the optical depth of the dust distribution while leaving
the amount of dust unchanged. Indeed in this case, on one
side a lower concentration of the radiation field relative to
the dust density yields colder equilibrium temperatures, on
the other side small grains and PAHs respond to single UV
photons, which have a longer mean free path with a smaller
$\tau$. The effect of $\tau_h$ alone is a modulation with respect to $\tau$.



[Figure 3 plot: horizontal axis, number of hidden nodes. Figure 2 plot: horizontal axis, log λ [μm].]

Figure 2. Examples of the effect of the cirrus input neurons.
Continuous (black) line: reference cirrus SED for a spherical model.
Dot-dashed (blue) line: hardness ratio increased by ~ 20% (obtained
by avoiding the SSPs with the highest metallicity available
in our library, so as to have somewhat harder intrinsic stellar
spectra). Short dashed (pink) line: $M_{dust}$ increased by a factor of 2
(obtained by doubling $\delta$, and the scale radii by $\sqrt{2}$ to leave $\tau$
unchanged). Dotted (green) line: $\tau$ of cirrus dust decreased by a
factor 5 (by increasing stars and dust scale radii by the same factor
to leave their ratio and the dust mass unchanged). Three-dot-dashed
(violet) line: $\tau_h$ decreased by a factor 10 (by increasing the
galaxy radius by $\sqrt{10}$, which leaves $\tau$ essentially unchanged since the
King profile is quite centrally concentrated). Long-dashed (red)
line: star to dust scale radii ratio halved (by halving the stellar
scale radius).

The shape of the SED has a strong dependence on the star
to dust scale radii ratio, since this implies a different distribution
of the radiation field and therefore a redefinition of the
temperature distribution function of dust grains. We note that
the input neurons we found to work for the ANN are not
fully independent; in fact a variation of essentially any one
of them implies also different amounts of reprocessing. In
the depicted examples, the reprocessing changes from a few
per cent to ~ 40%. We empirically found that the ANN
provides a better performance with this additional information.

3.4 Network training 

The MLPs we use in the following have been trained on
some thousands of CHE_EVO-GRASIL models, either for pure
spheroids or pure discs, covering generously the range of
parameter values used in several of our past works. The training
libraries have been efficiently produced using the GRASIL web
interface GALSYNTH, accessible through
http://adlibitum.oat.ts.astro.it/silva/default.html. Actually,
the definition of how large the range must be is non trivial,
since the properties of the mock galaxies calculated in simulations
of galaxy formation are not predictable a priori, nor
are those of high-z galaxies in the real universe.

Figure 3. Mean total error (i.e. averaged over all wavelengths
and all models used for the training) provided
by the trained network on the normalized (-1 to
1) fluxes of the verification sets, as a function of the number
of neurons in the hidden layer, for molecular clouds
(asterisks) and cirrus (crosses).

For the applications shown in the next Section,
the range of values of the input neurons used for the
training is very large. This is particularly required for
the ABC galaxy model, characterized by the presence
of extreme phases of the evolution of the SFR
(Sec. 4.2). Specifically:

• $\tau_{MC}$ = 1 to 70;

• $R_{MC}/R_{min}$ = 18 to 3000;

• $\log(L_{cir}/L_{\star,c})$ = -2 to -0.02;

• $\log(M_{cir}/L_{\star,c})$ = -8 to -6;

• $\tau_e$ = 0.01 to 100;

• $\tau_p$ = 0.007 to 70;

• $\tau_h$ = $8 \times 10^{-4}$ to 8;

• $r_{c,\star}/r_{c,diff}$ and $r_{d,\star}/r_{d,diff}$ = 0.2 to 5;

• $z_{d,\star}/r_{d,\star}$ and $z_{d,diff}/r_{d,diff}$ = 0.02 to 0.3;

• $\log(L_{\star,c}(0.3)/L_{\star,c}(1))$ = -0.6 to 0.8.

We trained the net with 90% of the models, randomly
chosen within the library generated with the
aforementioned range of parameters, and used the
remaining 10% as a verification set. The training
procedure does not (in this application) change the
structure of the NN. We have empirically adjusted the
number of neurons in the hidden layer, $n_{hid}$. As shown in
Fig. 3, the error provided by the network on the
verification sets decreases with increasing number of
nodes down to a minimum, then it remains substantially
flat unless the number becomes very large. In
other words, the increase after the minimum is very
shallow, if any, so that the minimum point is not
very well defined. Guided by this and similar plots
on different test set selections, in the applications
shown below we have adopted $n_{hid} = 20$ for molecular
clouds and 35 for cirrus. We trained the MLP using
500 training epochs (iterations) with a learning rate of 0.001.
These choices too are not very critical for the final results.
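The random 90%/10% split of the model library and the rescaling of the fluxes to the interval -1 to 1 can be sketched as follows. The function names are hypothetical, and the min-max form of the rescaling is an assumption: the text states only that the fluxes are normalized to the range -1 to 1.

```python
import random

def split_library(models, train_fraction=0.9, seed=0):
    """Randomly split a library into a training and a verification set."""
    shuffled = list(models)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

def scale_to_unit_interval(values):
    """Linearly map a list of values onto the interval [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant list of values")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

Repeating the split with different values of `seed` gives the different test set selections against which the choice of the number of hidden neurons can be checked.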

An important expected advantage of the ANN technique
with respect to classical interpolations is the capability
to "learn" the effect of each single input neuron on the
SED, mimicking in some sense the skill that a real GRASIL
user develops with experience. Therefore we may expect that
the MLP can produce a correct SED, corresponding to a
given choice of input neurons, even when the training set
does not include examples with the entire set of input neurons
bracketing the required ones at the same time. It is
normally sufficient that each single input neuron is independently
within the values included for the training. Indeed,
the general performance we have experienced confirms our
expectations. On the other hand, we expected and we found
that the ANN often fails catastrophically when one or more
parameters are not within the range of values of the training
set. For a given input galaxy model, it is therefore
fundamental to check whether the training range encloses all the
values of the input neurons. Actually, to prevent the
general user from an improper use of the ANN, we
have implemented in GRASIL a check to force the full
radiative transfer computation whenever this
condition is not met.
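A check of this kind can be sketched as follows, using the training ranges quoted earlier in this Section. This illustrates the logic only and is not the GRASIL implementation: the dictionary keys, the function names and the two callables `ann_predict` and `full_computation` are hypothetical.

```python
# Training ranges quoted in the text; the key names are hypothetical.
TRAINING_RANGES = {
    "tau_mc": (1.0, 70.0),
    "r_mc_over_r_min": (18.0, 3000.0),
    "log_l_cir_over_l_star": (-2.0, -0.02),
    "log_m_cir_over_l_star": (-8.0, -6.0),
    "tau_e": (0.01, 100.0),
    "tau_p": (0.007, 70.0),
    "tau_h": (8e-4, 8.0),
    "r_star_over_r_diff": (0.2, 5.0),
    "z_over_r": (0.02, 0.3),
    "log_hardness": (-0.6, 0.8),
}

def outside_training_range(neurons, ranges=TRAINING_RANGES):
    """Return the names of the input neurons falling outside the training range."""
    return [name for name, value in neurons.items()
            if not ranges[name][0] <= value <= ranges[name][1]]

def dust_sed(neurons, ann_predict, full_computation):
    """Use the ANN only when every input neuron is inside the training range."""
    if outside_training_range(neurons):
        return full_computation(neurons)
    return ann_predict(neurons)
```

Because each neuron is tested independently, the check matches the statement that it is normally sufficient for each single input neuron to lie within the trained values.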

Note that the ANN adopted in the following has been
trained on models computed with given intrinsic dust properties,
specifically properties compliant with the average
Milky Way-type dust (size distributions, relative
abundances of graphite and silicates, PAH abundance, slope
of the dust emissivity in the sub-mm etc., see S98). The full
GRASIL model has the freedom to modify these quantities,
even if in the standard use this is not usually exploited,
mainly because they are extremely poorly known.
To exploit this freedom with the ANN, a suitably trained
network has to be built with the chosen intrinsic dust
properties and the same input neurons described above. This is far
more convenient than including also these properties
as neurons, because of their large number, and also
because they would be of little use for semi-analytical galaxy
models, whose range of predictions does not reach such details.
Such a dedicated network would serve to test specific requirements.



3.5 Computing performance 

The implementation of the ANN into GRASIL dramatically
reduces the CPU time required to run the code. As recalled
above, with a ~ 2 GHz CPU a single run could take
up to ~ 10 minutes to calculate a SED, depending
on the geometry. With the use of an ANN this time is reduced
to just a few seconds, most of which is taken
up by the processes that are not calculated by the ANN.
This is a CPU gain of more than 2 orders of magnitude. Such a
remarkable reduction in computing costs should make possible
an efficient comparison of the SEDs of SAMs to large
observational galaxy surveys with a proper dust treatment.
Figure 4. M82: Original vs ANN and residuals. Upper panel:
black continuous line is the total original, red dashed is the total ANN;
dot-dashed black and green lines are for molecular clouds; dotted black and
blue lines are for cirrus. Lower panel: total residual.

Figure 5. NGC6090: Original vs ANN and residual.

4 APPLICATIONS 

4.1 Examples with single SEDs 

In Figs. 4 to 9 we show examples of comparisons between the
SEDs directly computed with GRASIL and those estimated
with the ANN. These examples comprise model fits to the
real and well defined SEDs of galaxies in different evolutionary
states, which are commonly used as benchmarks for
models of dusty galaxies. The specific sets of parameters of
these models were not included in the training set. The trained
MLPs therefore perform well enough for most purposes,
such as fast exploration of parameter space. Given the small
amount of CPU time needed to calculate a single SED using the
ANN, it will be possible to employ techniques of automatic
optimization for model parameters with suitable programs
(e.g. MRQMIN in Press et al. 1996).

Figure 6. ARP220: Original vs ANN and residual.

Figure 7. M51: Original vs ANN and residuals.

Figure 8. M100: Original vs ANN and residual.

Figure 9. NGC6946: Original vs ANN and residual.

4.2 Application to the ABC semi-analytical model 

As already mentioned in the Introduction, perhaps the most
obvious application in which a significantly quicker way to
estimate a GRASIL SED is a considerable advantage is the use
of this model in combination with SAMs. In this case,
to test these models against observations such as luminosity
functions, number counts and galaxy scaling relations,
GRASIL has to be run for at least a few thousand mock
galaxies, a quite demanding computing task. On the other
hand, since many of these observables are integrated quantities,
reasonably small inaccuracies in the computation of
each single SED, without systematics, are acceptable. In
this Section we demonstrate that our trained MLPs meet
this practical request, showing some applications with the
ABC model (Granato et al. 2004) for the co-evolution of
spheroids and QSOs, which requires only a spherical geometry. ABC
is a simple, yet quite successful, semi-analytical model originally
developed to provide an interpretation for sub-mm
selected galaxies and their possible descendants, the local
massive spheroidal galaxies, accounting in particular for the
growth by accretion of a central super massive black hole
and its feedback on the host galaxy. The general behavior
of the evolution of the mock galaxies envisaged by the
model is characterized by a strong and relatively short
dust-enshrouded SF phase during which a central SMBH grows,
a QSO phase halting subsequent star formation, and then
essentially passive evolution. We refer to the original papers
for more details (G04; Granato et al. 2006; Silva et al. 2005;
Lapi et al. 2006).

Here, to exemplify the effectiveness of our ANN computed
SEDs in the process of SAM validation, we show the
expected number density of the proto-spheroids output by
ABC in the PACS and SPIRE Herschel imaging bands,
computing the SED of each mock galaxy at all phases (i.e.
redshift slices) with either the full code or the ANN, and
compare the results. In Fig. 10 we show examples of randomly
extracted SEDs from the ABC model. The original
and ANN SEDs are often very difficult to distinguish in
these plots. A systematic comparison is in Fig. 11, showing
the residuals between the original and the ANN SEDs
vs wavelength for ~ 400 objects extracted from ABC galaxy
catalogues at various redshifts between 2 and 6. We show the
median and the 0.1-0.9 and the 0.05-0.95 percentiles. It is
worth pointing out that the typical error introduced
by the use of the ANN is less than ~ 10%, meaning
that the total uncertainty is likely dominated by uncertainties
in the adopted GRASIL physics, by the simplified geometry
and by the numerical approximations. However, a
detailed comparison between these contributions to
the total uncertainty in the model is outside the
scope of this paper, while our point here is to investigate
the capability of the ANN to avoid the time
consuming GRASIL computations.
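The statistics shown in such a comparison, i.e. the median and the percentile bands of the residuals at each wavelength, can be sketched as follows. The fractional definition of the residual, (ANN - full)/full, the linear-interpolation percentile and the function names are assumptions made here for illustration.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation, with q between 0 and 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def residual_bands(full_seds, ann_seds, quantiles=(0.05, 0.1, 0.5, 0.9, 0.95)):
    """Per-wavelength percentiles of the fractional residuals (ANN - full) / full.

    Both arguments are lists of SEDs sampled on the same wavelength grid.
    """
    bands = []
    for i in range(len(full_seds[0])):
        res = sorted((a[i] - f[i]) / f[i] for f, a in zip(full_seds, ann_seds))
        bands.append({q: percentile(res, q) for q in quantiles})
    return bands
```

The entry for `0.5` in each dictionary is the median curve, while the pairs (0.1, 0.9) and (0.05, 0.95) delimit the two bands.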

Integrated quantities such as luminosity functions and
number counts are more accurately reproduced than single
SEDs, since small differences in the SEDs tend to be
smoothed out. To illustrate this, in Figs. 12 and 13 we show
the integral galaxy counts in the PACS and SPIRE Herschel
bands at 70, 100, 160, 250, 350 and 500 μm for the
ABC model, obtained with both the fully computed SEDs and
the ANN quick estimate. The curves can hardly be
distinguished, so that the latter is fully adequate to compare
model predictions with available and forthcoming data.
Data for differential number counts are available at 250, 350
and 500 μm, obtained with the balloon-borne
BLAST telescope (Devlin et al. 2009) and very recently with
Herschel-SPIRE. In Fig. 14 we compare the counts by the
ABC model, as obtained with the full and the ANN
computation for the SEDs, with BLAST
data by Patanchon et al. (2009, triangles) and Bethermin
et al. (2010, asterisks), and with SPIRE data by Clements
et al. (2010, squares) and Oliver et al. (2010, diamonds).
In addition to the forming spheroids, we have included an
empirical estimate for the contribution by late-type galaxies
(spirals and starbursts) from Silva et al. (2004). The number
density of proto-spheroids appears consistent with the available
data, although quite high, particularly at 250 and 350 μm.
A deeper investigation would require testing
the effects of different dust properties and star-dust
distributions on the predicted counts. The implementation
of a fast algorithm for the SEDs makes this task easy to
perform, moreover taking into account the effects on the
full wavelength range. A discussion on the interpretation of
galaxy counts is beyond the scope of this paper, and will be
presented in another paper with a more general SAM.

Figure 10. Examples of randomly extracted galaxy models from the ABC SAM. The SEDs obtained either with the full GRASIL (dashed
lines) or the ANN reconstruction (solid lines) are almost superimposed.

Figure 11. Residuals as a function of wavelength between original
and ANN SEDs, for a sample of about 400 dusty mock galaxies
at various redshifts between 2 and 6, generated by the ABC
spheroid-QSO co-evolution model (Granato et al. 2004). Continuous
(purple) line: median; Dashed (red) lines: 10-90% percentiles;
Dot-dashed (green) lines: 5-95% percentiles.

Figure 12. Spheroidal integral galaxy number counts in the three
Herschel PACS bands at 70, 100 and 160 μm, as predicted by
the ABC SAM (Granato et al. 2004): comparison between counts
obtained with the full computation for the SED (dot-dashed violet
line), and the ANN reconstruction (continuous blue line). The two
lines are almost superimposed.
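The comparison of integral counts obtained with the two sets of SEDs can be sketched as follows. This is only an illustration: it assumes that each mock galaxy carries a weight giving its number density per unit sky area, and the function names are hypothetical.

```python
def integral_counts(fluxes, weights, thresholds):
    """N(>S): summed weight of the sources brighter than each flux threshold."""
    return [sum(w for s, w in zip(fluxes, weights) if s > t) for t in thresholds]

def max_fractional_difference(counts_full, counts_ann):
    """Largest fractional difference between two count curves, ignoring the
    thresholds at which the reference counts vanish."""
    return max(
        (abs(a - f) / f for f, a in zip(counts_full, counts_ann) if f > 0),
        default=0.0,
    )
```

Because each count sums over many sources, modest non-systematic errors in the individual fluxes move only the few sources lying close to a threshold, which is why integrated quantities are reproduced more accurately than single SEDs.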

5 CONCLUSIONS 

We have presented the implementation of an artificial neural
network algorithm to compute the SEDs with the GRASIL
code. The main aim is to have a reliable radiative transfer
computation for the theoretical SEDs together with a short computing
time, sufficiently short to be applied to cosmological volumes
populated by semi-analytical galaxy formation models. Of course,
this also opens the possibility of a fast exploration of
parameters to fit data. The main points of the paper are
listed in the following:

• SEDs are complex and non-linear functions of many
galaxy properties resulting from their star formation and
assembly histories, such as the age and metallicity distribution
of the stars, the amount and composition of gas and
dust, the relative distribution of dust and stars etc. A radiative
transfer computation of the stellar radiation field
through the dust distribution, to get the extincted stellar
and dust emission spectrum, is a time-consuming task. The
required time becomes prohibitive in particular for applications
involving simulations of cosmological galaxy catalogues
with semi-analytical galaxy formation models, requiring
thousands of mock galaxies at each redshift slice. On
the other hand, in order to exploit as much as possible all
available data to constrain models, it would be preferable to
maintain the possibility to assign to each galaxy a SED that
reflects as much as possible its actual properties, instead
of relying on pre-defined templates that may have nothing
to do with the galaxy configuration.

Figure 13. Spheroidal integral galaxy number counts in the three
Herschel SPIRE bands at 250, 350 and 500 μm. Meaning of lines
as in Fig. 12.

• ANNs are tools particularly suited to approximate complex
non-linear functions. We have implemented a standard
feed-forward back-propagation ANN into the GRASIL model.
The main characteristics of this model were defined by the 
requirements of having a relatively realistic representation of 
galaxies (in particular by accounting for a two-phase dusty 
medium heated by stars of different ages and for the tem- 
perature distribution of the dust) and an acceptable (for 
many applications) computing time. The real bottleneck to 
get the SED is the computation of the dust emission spec- 
trum, since it requires the computation of the distribution 
of the radiation field at each point in the galaxy, and the 
ensuing dust temperature for each type of grain. Therefore 
we have implemented the ANN to compute the dust emis- 
sion spectra, and separately for the star forming molecular 
clouds and the diffuse medium due to their different prop- 
erties. The gain in computing time is more than 2 orders of 
magnitude. 

• To implement the ANN we have (a) identified the quantities
(input neurons) that effectively control the shape of
the dust emission SED from the two dusty components, and
(b) trained the network with a large set of pre-computed
models covering a large range of values of the input neurons.
The input neurons are 2 for MCs (optical depth and
ratio of outer to sublimation radius), and 6 or 9 for cirrus
emission, for spherical or disc geometry respectively (optical
depths, hardness of the radiation field, ratios of star to dust
scale radii, mass and bolometric luminosity of the diffuse
dust). The network is meant to be of general use because, by
construction, it is independent of the specifics of the galaxy
formation model in use: the quantities that effectively determine
the shape of the dust emission spectrum are extracted
from the input star formation histories and used as input
neurons. This is complementary to the work by Almeida et
al. (2010), where an ANN has been implemented specifically
for the combined GALFORM+GRASIL model. In that case the
ANN is meant to give the full SED and the method is even
faster than the one presented here.

Figure 14. Differential galaxy counts normalized to Euclidean
at 250, 350 and 500 μm. The 3 dot-dashed (violet) and continuous
(blue) curves are the star forming spheroids from the G04
model with the full and ANN computation of the SEDs, respectively.
The dashed (red) curve is the contribution from late type
galaxies (spirals and starbursts). The total is the black continuous
line. BLAST counts are by Patanchon et al. (2009, triangles)
and Bethermin et al. (2010, asterisks). Herschel-SPIRE counts
are by Clements et al. (2010, squares) and Oliver et al. (2010, diamonds).

• We have tested the computation of the SEDs with the
ANN on single SEDs and on a simple semi-analytical
model. In this first paper the ANN has been implemented
for pure spherical or pure disc geometries. The mixed
bulge+disc geometry will be presented and applied in another
paper. We have compared the full and the ANN
computation for model SEDs that fit nearby well-observed
starburst and disc galaxies. We have then made the same
comparison for SEDs and galaxy counts for the ABC
semi-analytical model by Granato et al. (2004) for the joint
formation of spheroids and QSOs. The ANN appears to perform
well in all the explored cases, which cover star formation
histories ranging from relatively quiescent spirals to extreme
dust-enshrouded starbursts. It is also to be noted that small
inaccuracies in the SEDs are smoothed out when computing
integrated quantities such as LFs and galaxy counts. As
for the latter, we have shown that the counts in the PACS
and SPIRE Herschel imaging bands for the ABC model, as
obtained with the full and ANN computation, are almost
superimposed. This means that a thorough exploration of
the effects of different assumptions on the dust properties,
which are not output by the galaxy formation model but must
be assumed for the SEDs, can easily be performed. The
implications of these counts for the Herschel
surveys are beyond the scope of this paper, and will be
discussed elsewhere. The computation of SEDs with the ANN
method appears robust and computationally advantageous
to analyze and test galaxy formation models in cosmological
volumes.